An unsupervised approach to creating web audio contents-based HMM voices
Abstract
This paper presents an approach to the rapid, low-cost creation of varied synthetic voices. The approach consists of collecting audio content from the web, extracting usable speech from it, transcribing the speech to surface text, performing phone-time alignment, and using the speech and transcripts to build HMM-based voices. A set of experiments is conducted to evaluate this approach. The results indicate that large volumes of audio content are available on the internet, but more than 33.3% of web radio data is unusable for building voices because of noise, music, and overlapping speakers. Among the 14 voices built from limited radio monologues in Japanese, three are rated fair (the middle of the five-point scale) and two are rated bad (the lowest level). Erroneous transcripts have a significant influence on voice quality. To achieve fair voice quality with limited speech data, the phone and word accuracy of the speech transcriptions must be higher than 80% and 50%, respectively.
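The accuracy thresholds in the abstract can be illustrated with a minimal sketch. The abstract does not say how accuracy is computed, so this assumes the conventional edit-distance definition, accuracy = (N - S - D - I) / N, where N is the number of reference tokens and S, D, I are substitutions, deletions, and insertions. The function names `edit_distance`, `accuracy`, and `meets_fair_quality_thresholds` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (S + D + I)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed definition: (N - S - D - I) / N; can be negative with many insertions."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return (len(ref) - edit_distance(ref, hyp)) / len(ref)


def meets_fair_quality_thresholds(phone_acc, word_acc,
                                  phone_min=0.80, word_min=0.50):
    """Check the abstract's stated requirement: both accuracies strictly above the minimums."""
    return phone_acc > phone_min and word_acc > word_min
```

The same `accuracy` function applies to phone sequences and word sequences alike, since it only compares tokens for equality. For example, `accuracy("a b c d".split(), "a x c".split())` counts one substitution and one deletion against four reference tokens.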
Similar resources
An investigation of the impact of speech transcript errors on HMM voices
Toward automatic creation of web-based voice fonts at low cost, automatic speech transcription technology is used to obtain the linguistic features for building HMM-based voices from audio web contents. This paper presents an investigation of the influences of erroneous transcripts on such voices. We simulate varied speech transcript errors by using a large vocabulary automatic speech recognize...
Some Aspects of ASR Transcription Based Unsupervised Speaker Adaptation for HMM Speech Synthesis
Statistical parametric synthesis offers numerous techniques to create new voices. Speaker adaptation is one of the most exciting ones. However, it still requires high-quality audio data with a low signal-to-noise ratio and precise labeling. This paper presents an automatic speech recognition based unsupervised adaptation method for Hidden Markov Model (HMM) speech synthesis and its quality evalu...
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...
Unsupervised onset detection: A probabilistic approach using ICA and a hidden Markov classifier
We describe an onset detection system that takes a two-stage approach, in which both stages are based on unsupervised learning in a probabilistic model. The first stage uses independent component analysis (ICA) to fit a short-term non-Gaussian model to frames of audio data. This model is used to generate a reduced signal to be interpreted as the ‘surprisingness’ of the original audio signal. Our hypoth...
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary contin...
Publication date: 2010